feat: Implement Block Streamer Bitmap Operations #747

darunrs · 2024-05-24T01:10:28Z

The bitmap indexer will return a list of bitmaps as base64 strings, each with an associated start block height. We need a way to convert all that data into a single block height and an associated bitmap.

This PR introduces a new BitmapOperator class which holds all the operations needed to return a combined binary bitmap, with the lowest start block height at index 0.

morgsmccauley

Honestly, I have no idea what's going on here, mainly due to unfamiliarity with the algorithm used. It will take some time for me to understand, so rather than block you, I've provided the best review I can with my limited knowledge. I'm trusting that you know what's going on 😄.

If it's possible, it would be good to break the code up into more well-defined/cohesive methods, but that doesn't need to happen here.

There are a lot of clippy errors; could you run cargo clippy and fix them, please?

morgsmccauley · 2024-05-27T03:54:06Z

block-streamer/Cargo.toml

@@ -33,6 +33,7 @@ tonic = "0.10.2"
 wildmatch = "2.1.1"

 registry-types = { path = "../registry/types" }
+base64 = "0.22.1"


Can you place this with the other versioned imports, please?

morgsmccauley · 2024-05-28T04:48:48Z

block-streamer/src/bitmap.rs

+        Self {}
+    }
+
+    pub fn get_bit(&self, byte_array: &[u8], bit_index: usize) -> bool {


Suggested change

pub fn get_bit(&self, byte_array: &[u8], bit_index: usize) -> bool {

pub fn get_bit(&self, bytes: &[u8], bit_index: usize) -> bool {

Nit: bytes seems sufficient here?

morgsmccauley · 2024-05-28T04:49:56Z

block-streamer/src/bitmap.rs

+        (byte_array[byte_index] & (1u8 << (7 - bit_index_in_byte))) > 0
+    }
+
+    fn set_bit(&self, byte_array: &mut [u8], bit_index: usize, bit_value: bool, write_zero: bool) {


Why do we take both bit_value and write_zero? Would bit_value alone be sufficient?

This is necessary because we sometimes want to write a 0 over a 1, which we usually don't want to do. Specifically, this happens when we are replacing one Elias Gamma encoding with another, as the new length might be shorter (leaving extra 1's that should be zero). Technically we don't need it in the current code, but I ported it over as it's exactly how we have it in the indexer logic.
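A minimal sketch of how the two flags could interact. The `get_bit` body follows the line shown in the diff; the `set_bit` body is an assumption based on the explanation above (the real implementation is not shown here), and both are written as free functions rather than `BitmapOperator` methods for brevity.

```rust
/// Reads the bit at `bit_index`, counting from the most significant bit of byte 0.
pub fn get_bit(bytes: &[u8], bit_index: usize) -> bool {
    let byte_index = bit_index / 8;
    let bit_index_in_byte = bit_index % 8;
    (bytes[byte_index] & (1u8 << (7 - bit_index_in_byte))) > 0
}

/// Sets the bit at `bit_index`.
/// Assumed behaviour: a `false` value only clears an existing 1 when
/// `write_zero` is true; otherwise the byte is left untouched.
pub fn set_bit(bytes: &mut [u8], bit_index: usize, bit_value: bool, write_zero: bool) {
    if !bit_value && !write_zero {
        return; // OR-style write: zeros never overwrite ones
    }
    let byte_index = bit_index / 8;
    let mask = 1u8 << (7 - (bit_index % 8));
    if bit_value {
        bytes[byte_index] |= mask;
    } else {
        bytes[byte_index] &= !mask;
    }
}
```

With `write_zero` set to `false`, the function behaves like a bitwise OR, which is what merging needs; `true` is only required when a shorter encoding must overwrite a longer one.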

morgsmccauley · 2024-05-28T04:50:19Z

block-streamer/src/bitmap.rs

+        }
+    }
+
+    fn get_number_between_bits(


What is number? Can we be more explicit?

Hmm basically we encoded a number as binary and are simply reading the binary value from a particular stretch of bits. Perhaps I can rename this to read_integer_from_binary? Even though all our functions deal with binary, maybe this can explicitly state this binary is utilized to build an integer.
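To illustrate the idea, here is a rough sketch of reading an integer from a stretch of bits. The name `read_integer_from_binary` is the rename proposed above; the signature, the inclusive end index, and the `u32` return type are assumptions, not the code in this PR.

```rust
/// Interprets the bits in `start_bit_index..=end_bit_index` as an unsigned
/// integer, most significant bit first.
/// Note: bits beyond the 32nd are silently shifted out in this sketch.
pub fn read_integer_from_binary(
    bytes: &[u8],
    start_bit_index: usize,
    end_bit_index: usize,
) -> u32 {
    let mut number: u32 = 0;
    for bit_index in start_bit_index..=end_bit_index {
        // Extract a single bit, counting from the MSB of each byte.
        let bit = (bytes[bit_index / 8] >> (7 - (bit_index % 8))) & 1;
        number = (number << 1) | u32::from(bit);
    }
    number
}
```

For example, reading bits 2 through 5 of `0b0010_1100` yields the binary value `1011`.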

morgsmccauley · 2024-05-28T04:52:18Z

block-streamer/src/bitmap.rs

+        &self,
+        byte_array: &[u8],
+        start_bit_index: usize,
+    ) -> anyhow::Result<usize> {


Suggested change

) -> anyhow::Result<usize> {

) -> Option<usize> {

Option seems more idiomatic here
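The diff does not show what this function computes, so the following is only an illustration of the suggestion. It assumes, hypothetically, a search for the next set bit, where "nothing found" is an ordinary outcome rather than an error; the name `first_set_bit_from` is made up for this sketch.

```rust
/// Returns the index of the first set bit at or after `start_bit_index`,
/// or `None` when every remaining bit is zero.
pub fn first_set_bit_from(bytes: &[u8], start_bit_index: usize) -> Option<usize> {
    (start_bit_index..bytes.len() * 8)
        .find(|&bit_index| (bytes[bit_index / 8] & (1u8 << (7 - (bit_index % 8)))) != 0)
}
```

Callers can then use `?`, `if let`, or `unwrap_or_default` without constructing an `anyhow` error for a case that is not really a failure.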

morgsmccauley · 2024-05-28T04:56:36Z

block-streamer/src/bitmap.rs

+            return EliasGammaDecoded {
+                value: 0,
+                last_bit_index: 0,
+            };


Suggested change

return EliasGammaDecoded {
    value: 0,
    last_bit_index: 0,
};

return EliasGammaDecoded::default()

Could use Default here, but you'll need to derive it on EliasGammaDecoded

That's a good idea actually. It would definitely make the match look nicer too. It seems the default for usize is 0 anyway.
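A small sketch of the derive being discussed. The field names come from the diff above; the `usize` field types follow the reply's remark, and the extra derives (`Debug`, `PartialEq`, `Eq`) are added here only so the struct can be compared in tests.

```rust
/// Result of decoding one Elias gamma code.
/// `Default` yields `value: 0, last_bit_index: 0`, since `usize::default()` is 0.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct EliasGammaDecoded {
    pub value: usize,
    pub last_bit_index: usize,
}
```

With this in place, the early return collapses to `EliasGammaDecoded::default()`.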

morgsmccauley · 2024-05-28T04:59:11Z

block-streamer/src/bitmap.rs

+    fn decompress_bitmap(&self, compressed_bitmap: &[u8]) -> Vec<u8> {
+        let compressed_bit_length: usize = compressed_bitmap.len() * 8;
+        let mut current_bit_value: bool = (compressed_bitmap[0] & 0b10000000) > 0;
+        let mut decompressed_byte_array: Vec<u8> = Vec::new();


Do we know the length of this upfront? Vec::with_capacity() would avoid unnecessary re-allocations.

If we knew capacity, maybe we could just use &[u8] 🤔

We don't know the size upfront. We need to decode each Elias Gamma (EG) code to know how long its bit sequence is, and we can have many of them. We do know the upper bound, which is about 86,400 bits, since there is 1 bit per block and 86,400 seconds in a day. But I felt it was unnecessary to allocate a byte array of roughly 11KB every time, as we usually don't need that many bits.
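The sketch below shows why the output length only becomes known while decoding. It assumes a run-length layout: the first bit gives the value of the first run (as in the diff above), followed by Elias gamma codes for successive run lengths with the bit value alternating. That layout, and both function names, are assumptions for illustration; the code does not guard against malformed input.

```rust
/// Decodes one Elias gamma code starting at `start_bit_index`.
/// Returns `(value, last_bit_index)`, or `None` if no complete code remains.
pub fn decode_elias_gamma(bytes: &[u8], start_bit_index: usize) -> Option<(usize, usize)> {
    let total_bits = bytes.len() * 8;
    let bit = |i: usize| ((bytes[i / 8] >> (7 - (i % 8))) & 1) == 1;
    // N leading zeros announce an (N + 1)-bit binary value.
    let first_one = (start_bit_index..total_bits).find(|&i| bit(i))?;
    let zeros = first_one - start_bit_index;
    let last_bit_index = first_one + zeros;
    if last_bit_index >= total_bits {
        return None;
    }
    let value = (first_one..=last_bit_index)
        .fold(0usize, |acc, i| (acc << 1) | usize::from(bit(i)));
    Some((value, last_bit_index))
}

/// Expands the assumed run-length layout into a plain bitmap.
pub fn decompress(compressed: &[u8]) -> Vec<u8> {
    let mut out: Vec<u8> = Vec::new(); // final length is unknown until decoding ends
    if compressed.is_empty() {
        return out;
    }
    let mut current_bit_value = (compressed[0] & 0b1000_0000) != 0;
    let mut read_index = 1;
    let mut write_index = 0;
    while let Some((run_length, last_bit_index)) = decode_elias_gamma(compressed, read_index) {
        // Grow the output just enough to hold this run.
        let needed_bytes = (write_index + run_length + 7) / 8;
        if out.len() < needed_bytes {
            out.resize(needed_bytes, 0);
        }
        if current_bit_value {
            for i in write_index..write_index + run_length {
                out[i / 8] |= 1 << (7 - (i % 8));
            }
        }
        write_index += run_length;
        read_index = last_bit_index + 1;
        current_bit_value = !current_bit_value;
    }
    out
}
```

Each run's length is only available after its code has been decoded, so the vector grows as runs are discovered instead of being pre-sized to the daily upper bound.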

morgsmccauley · 2024-05-28T05:01:31Z

block-streamer/src/bitmap.rs

+        decompressed_byte_array
+    }
+
+    fn merge_compressed_bitmap_into_base_bitmap(


What is the difference between base_bitmap and compressed_bitmap? Maybe this would be more obvious if we just had a merge function, and called decompress from the outside?

merge could even be defined on the Bitmap struct instead for further clarity

I think the confusion is that it is doing two different things: decompression and merging. Before decompression, it matters which bitmap is the compressed one, as we want to ensure bits are written to the decompressed one. But if the bitmaps are both decompressed, this is no longer an issue.

I think the better way to go forward is creating a merge_bitmap function like you mentioned but keep it in BitmapOperator. Then we do a three step sequence in the public get_merged_bitmap function: decode, decompress, merge. This I think would be clear while retaining BitmapOperator as a stateless utility class. I'm a little confused with how a Bitmap struct function would perform merge. I imagine it would need to have a BitmapOperator internally. I think it might make things confusing regarding who owns these data operator functions.

If there's a more clear way to structure this class, I'm happy to rework it when you're back!
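As a rough sketch of the merge step once both inputs are decompressed: the `Bitmap` field names and both function signatures below are hypothetical, the decode and decompress steps are omitted, and the functions are free-standing rather than `BitmapOperator` methods.

```rust
/// A decompressed bitmap: bit `i` of `bytes` describes block `start_block_height + i`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Bitmap {
    pub start_block_height: usize,
    pub bytes: Vec<u8>,
}

/// ORs `other` into `base`, shifted by the difference in start heights.
/// Assumes `other.start_block_height >= base.start_block_height`.
pub fn merge_bitmap(base: &mut Bitmap, other: &Bitmap) {
    let offset = other.start_block_height - base.start_block_height;
    for bit_index in 0..other.bytes.len() * 8 {
        let is_set = (other.bytes[bit_index / 8] & (1u8 << (7 - (bit_index % 8)))) != 0;
        if !is_set {
            continue;
        }
        let target = offset + bit_index;
        // Grow the base bitmap when the other one reaches further.
        if base.bytes.len() <= target / 8 {
            base.bytes.resize(target / 8 + 1, 0);
        }
        base.bytes[target / 8] |= 1u8 << (7 - (target % 8));
    }
}

/// Merges already-decompressed bitmaps so the lowest start height lands at index 0.
pub fn merge_bitmaps(mut bitmaps: Vec<Bitmap>) -> Option<Bitmap> {
    bitmaps.sort_by_key(|bitmap| bitmap.start_block_height);
    let mut iter = bitmaps.into_iter();
    let mut base = iter.next()?;
    for other in iter {
        merge_bitmap(&mut base, &other);
    }
    Some(base)
}
```

Because both arguments are plain decompressed bitmaps, the merge no longer needs to know which side was compressed, which is the asymmetry the review comment found confusing.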

morgsmccauley · 2024-05-28T05:02:35Z

block-streamer/src/bitmap.rs

+        Ok(())
+    }
+
+    pub fn get_merged_bitmap(


Suggested change

pub fn get_merged_bitmap(

pub fn merge_bitmaps(

Nit: this seems more clear?

Sounds good! I was originally thinking of merge_compressed_bitmaps, but maybe it's not worth requiring someone to know they're compressed before calling the function? Especially since the argument type is Base64Bitmap, which should only really be received from the GraphQL query.

morgsmccauley · 2024-05-28T05:03:01Z

block-streamer/src/bitmap.rs

+    use super::*;
+
+    #[test]
+    fn test_getting_bit_from_array() {


Suggested change

fn test_getting_bit_from_array() {

fn getting_bit_from_array() {

Nit: test_ seems superfluous here

That's true. I'll reword the test names.

darunrs added 9 commits May 23, 2024 12:35

adding bitmap file

211a68d

test set bit function

d602c59

Implemented and tested all sub functions for decoding compressed bitmap

e38200f

Implemented decompress function

70c464f

Test decompress function

1f7ae3a

Fully implement and test all bitmap functions

4627cbf

Some small modifications

3b4708a

Add decoding of base 64 strings

4e44707

remove commented unused var

a59c622

darunrs marked this pull request as ready for review May 24, 2024 19:51

darunrs requested a review from a team as a code owner May 24, 2024 19:51

morgsmccauley approved these changes May 28, 2024

darunrs added 2 commits June 4, 2024 13:14

Address PR Comments

5bc2f93

Rename all byte_array to bytes

4798546

darunrs merged commit 989c432 into main Jun 4, 2024
8 checks passed

darunrs deleted the block-streamer-bitmap-operations branch June 4, 2024 20:21

darunrs linked an issue Jun 4, 2024 that may be closed by this pull request

Implement Bitmap Operations into Block Streamer #742

Closed
